Dependency structure analysis and sentence boundary detection in spontaneous Japanese
Abstract
This paper addresses the automatic detection of dependencies between Japanese phrasal units, called bunsetsus, and of sentence boundaries in a spontaneous speech corpus. In spontaneous speech, the biggest problem for dependency structure analysis is that sentence boundaries are ambiguous. We propose two methods for improving the accuracy of sentence boundary detection in spontaneous Japanese: one based on unsupervised learning and the other based on machine learning. Experimental results show that the proposed methods achieve an F-measure of 84.85 for sentence boundary detection, and that using the automatically detected sentence boundaries also improves the accuracy of dependency structure analysis.
Similar resources
Dependency-structure Annotation to Corpus of Spontaneous Japanese
In Japanese, syntactic structure of a sentence is generally represented by the relationship between phrasal units, or bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large, spontaneous, Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper descri...
Word-level Dependency-structure Annotation to Corpus of Spontaneous Japanese and its Application
In Japanese, the syntactic structure of a sentence is generally represented by the relationship between phrasal units, bunsetsus in Japanese, based on a dependency grammar. In many cases, the syntactic structure of a bunsetsu is not considered in syntactic structure annotation. This paper gives the criteria and definitions of dependency relationships between words in a bunsetsu and their applic...
Dependency parsing of Japanese spoken monologue based on clause-starts detection
A dependency parsing method based on sentence segmentation into clauses has been proposed and confirmed to be effective. In this method, dependency parsing is executed in two stages: at the clause level and the sentence level. However, since a sentence cannot be segmented into complete clauses, in past research, a unit sandwiched between two clause-end boundaries (clause boundary unit) was...
Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines
This paper presents two different approaches to sentence boundary detection of spontaneous Japanese, one utilizing a statistical language model (SLM) and the other support vector machines (SVM). In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. On...
Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking
In spoken language, sentence boundaries are much less explicit than in written language. Since conventional natural language processing (NLP) techniques are generally designed assuming the sentence boundaries are already given, it is crucial to detect the boundaries accurately for applying such NLP techniques to spoken language. Classification frameworks, such as Support Vector Machines (SVMs) ...